A method and apparatus for selecting a quantizer scale for
each macroblock within a frame to optimize the coding rate
is presented. A quantizer scale is selected for each
macroblock within each frame such that the target bit rate for the
frame is achieved while maintaining a uniform visual quality
over the entire frame.

APPARATUS AND METHOD FOR MACROBLOCK BASED RATE CONTROL IN A CODING SYSTEM

This application claims the benefit of U.S. Provisional
Application No. 60/052,437 filed Jul. 14, 1997, which is
herein incorporated by reference.

The present invention relates to an apparatus and concomitant
method for optimizing the coding of motion video. More
particularly, this invention relates to a method and apparatus
that adaptively adjusts a quantizer scale for each macroblock
within a frame to maintain the overall quality of the motion
video while optimizing the coding rate.

BACKGROUND OF THE INVENTION

The Moving Picture Experts Group (MPEG) created the ISO/IEC
international Standards 11172 and 13818 (generally referred to
as MPEG-1 and MPEG-2 format respectively) to establish a
standard for coding/decoding strategies. Although these MPEG
standards specify a general coding methodology and syntax for
generating an MPEG compliant bitstream, many variations are
permitted to accommodate a plurality of different applications
and services such as desktop video publishing, video
conferencing, digital storage media and television broadcast.

In the area of rate control, MPEG does not define a specific
method for controlling the bit rate of an encoder. It is the
task of the encoder designer to devise a rate control process
for controlling the bit rate such that the decoder input buffer
neither overflows nor underflows.

Currently, one way of controlling the bit rate is to alter the
quantization process, which will affect the distortion of the
input video image. By altering the quantizer scale (step size),
the bit rate can be changed and controlled.

Although changing the quantizer scale is an effective method of
implementing the rate control of an encoder, it has been shown
that a poor rate control process will actually degrade the
visual quality of the video image, i.e., failing to alter the
quantizer scale in an efficient manner such that it is
necessary to drastically alter the quantizer scale toward the
end of a picture to avoid overflow and underflow conditions.
Since altering the quantizer scale affects both image quality
and compression efficiency, it is important for a rate control
process to control the bit rate without sacrificing image
quality.

In the current MPEG coding strategies (e.g., various MPEG test
models), the quantizer scale for each frame is selected by
assuming that all the pictures of the same type have identical
complexity within a group of pictures. However, the quantizer
scale selected by this criterion may not achieve optimal coding
performance, since the complexity of each picture will vary
with time.

Furthermore, encoders that utilize global-type transforms,
e.g., wavelet transform (otherwise known as hierarchical
subband decomposition), have similar problems. For example,
wavelet transforms are applied to an important aspect of low
bit rate image coding: the coding of a binary map (a wavelet
tree) indicating the locations of the non-zero values,
otherwise known as the significance map of the transform
coefficients. Quantization and entropy coding are then used to
achieve very low bit rates. It follows that a significant
improvement in the proper selection of a quantizer scale for
encoding the significance map (the wavelet tree) will translate
into a significant improvement in compression efficiency and
coding rate.

Furthermore, rate control can be implemented at lower levels
within a frame, e.g., at the macroblock or block levels.
However, macroblock level rate control is generally more
costly, since there is an additional overhead if the
quantization parameter (quantizer scale) is changed within a
frame. Namely, more bits are needed to communicate to the
decoder the different quantizer scales for different
macroblocks within each frame. This criticality is exacerbated
in low bit rate applications, where proper bit management is
very important.

Therefore, a need exists in the art for an apparatus and method
that adaptively adjusts a quantizer scale for each macroblock
within a frame to maintain the overall quality of the motion
video while optimizing the coding rate.

SUMMARY OF THE INVENTION

The present invention is a method and apparatus for selecting a
quantizer scale for each block, e.g., a macroblock, within each
frame to maintain the overall quality of the video image while
optimizing the coding rate. Namely, a quantizer scale is
selected for each macroblock within each frame (picture) such
that the target bit rate for the picture is achieved while
maintaining a uniform visual quality over the entire frame.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily
understood by considering the following detailed description in
conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a block diagram of the apparatus of the
present invention;

FIG. 2 illustrates a block diagram of a flowchart of a method
for deriving and allocating the target bit rate for an image
based on blocks within the image;

FIG. 3 illustrates a flowchart of a method for determining a
target frame bit rate;

FIG. 4 illustrates a flowchart of a method for determining one
or more target macroblock bit rates for the macroblocks within
the current image;

FIG. 5 illustrates a block diagram of a second embodiment of
the apparatus of the present invention;

FIG. 6 is a graphical representation of a wavelet tree; and

FIG. 7 illustrates an encoding system of the present invention.

To facilitate understanding, identical reference numerals have
been used, where possible, to designate identical elements that
are common to the figures.

DETAILED DESCRIPTION

FIG. 1 depicts a block diagram of the apparatus 100 of the
present invention for deriving a quantizer scale for each
macroblock within each frame to maintain the overall quality of
the video image while controlling the coding rate. Although the
present invention is described below with reference to an MPEG
compliant encoder, those skilled in the art will realize that
the present invention can be adapted to other encoders that are
compliant with other coding/decoding standards.

In the preferred embodiment of the present invention, the
apparatus 100 is an encoder or a portion of a more complex
block-based motion compensation coding system. The apparatus
100 comprises a motion estimation module 140, a motion
compensation module 150, a rate control module 130, a DCT
module 160, a quantization (Q) module 170, a variable length
coding (VLC) module 180, a buffer 190, an inverse quantization
(Q^-1) module 175, an inverse DCT (DCT^-1) transform module
165, a subtractor 115 and a summer 155. Although the apparatus
100 comprises a plurality of modules, those skilled in the art
will realize that the functions performed by the various
modules are not required to be isolated into separate modules
as shown in FIG. 1. For example, the set of modules comprising
the motion compensation module 150, inverse quantization module
175 and inverse DCT module 165 is generally known as an
"embedded decoder".

FIG. 1 illustrates an input image (image sequence) on 
signal path 110 which is digitized and represented as a 
luminance and two color difference signals (Y, Cr, Cb) in
accordance with the MPEG standards. These signals are 
further divided into a plurality of layers (sequence, group of 
pictures, picture, slice, macroblock and block) such that each 
picture (frame) is represented by a plurality of macroblocks. 
Each macroblock comprises four (4) luminance blocks, one 
Cr block and one Cb block, where a block is defined as an
eight (8) by eight (8) sample array. The division of a picture 
into block units improves the ability to discern changes 
between two successive pictures and improves image com- 
pression through the elimination of low amplitude trans- 
formed coefficients (discussed below). The digitized signal 
may optionally undergo preprocessing such as format con- 
version for selecting an appropriate window, resolution and 
input format. 

The following disclosure uses the MPEG standard termi- 
nology; however, it should be understood that the term 
macroblock or block is intended to describe a block of pixels 
of any size or shape. Broadly speaking, a "macroblock" 
could be as small as a single pixel, or as large as an entire 
video frame. 

The input image on path 110 is received into motion 
estimation module 140 for estimating motion vectors. A 
motion vector is a two-dimensional vector which is used by 
motion compensation to provide an offset from the coordi- 
nate position of a block in the current picture to the coor- 
dinates in a reference frame. The reference frames can be a 
previous frame (P-frame), or previous and/or future frames 
(B-frames). The use of motion vectors greatly enhances
image compression by reducing the amount of information 
that is transmitted on a channel because only the changes 
between the current and reference frames are coded and 
transmitted. 

The motion vectors from the motion estimation module 
140 are received by the motion compensation module 150 
for improving the efficiency of the prediction of sample 
values. Motion compensation involves a prediction that uses 
motion vectors to provide offsets into the past and/or future 
reference frames containing previously decoded sample 
values that are used to form the prediction error. Namely, the 
motion compensation module 150 uses the previously 
decoded frame and the motion vectors to construct an 
estimate of the current frame. Furthermore, those skilled in 
the art will realize that the functions performed by the 
motion estimation module and the motion compensation 
module can be implemented in a combined module, e.g., a 
single block motion compensator. 

Furthermore, prior to performing motion compensation 
prediction for a given macroblock, a coding mode must be 
selected. In the area of coding mode decision, MPEG 
provides a plurality of different macroblock coding modes. 
Specifically, MPEG-2 provides macroblock coding modes 
which include intra mode, no motion compensation mode
(No MC), frame/field/dual-prime motion compensation inter
mode, forward/backward/average inter mode and field/
frame DCT mode.

Once a coding mode is selected, motion compensation
module 150 generates a motion compensated prediction
(predicted image) on path 152 of the contents of the block
based on past and/or future reference pictures. This motion
compensated prediction on path 152 is subtracted via
subtractor 115 from the video image on path 110 in the current
macroblock to form an error signal or predictive residual
signal on path 153. The formation of the predictive residual
signal effectively removes redundant information in the
input video image. It should be noted that if a current frame
is encoded as an I-frame, then the signal on path 153 is
simply the original picture and not a predictive residual
signal.

The DCT module 160 then applies a forward discrete
cosine transform process to each block of the predictive
residual signal to produce an eight (8) by eight (8) block
of DCT coefficients. The DCT basis function or subband
decomposition permits effective use of psychovisual criteria,
which is important for the next step of quantization.
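
As an illustration of the transform step, the following minimal sketch computes a forward 8x8 DCT in plain Python. It assumes the common orthonormal DCT-II scaling; the function name `dct_8x8` and the flat test block are hypothetical and are not taken from the disclosure.

```python
import math

N = 8  # block size: eight (8) by eight (8) samples

def dct_8x8(block):
    """Return the 2-D DCT-II of an 8x8 block (orthonormal scaling assumed)."""
    def c(k):
        # Normalization factor for the zero-frequency basis function.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out

# A flat block has energy only in the DC (lowest frequency) coefficient.
flat = [[8] * N for _ in range(N)]
coeffs = dct_8x8(flat)
```

For a smooth residual block most of the energy lands in a few low-frequency coefficients, which is what makes the subsequent quantization step effective.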

The resulting 8x8 block of DCT coefficients is received
by quantization module 170 where the DCT coefficients are
quantized. The process of quantization reduces the accuracy
with which the DCT coefficients are represented by dividing
the DCT coefficients by a set of quantization values with
appropriate rounding to form integer values. The quantization
values can be set individually for each DCT coefficient,
using criteria based on the visibility of the basis functions
(known as visually weighted quantization). Namely, the
quantization value corresponds to the threshold for visibility
of a given basis function, i.e., the coefficient amplitude that
is just detectable by the human eye. By quantizing the DCT
coefficients with this value, many of the DCT coefficients are
converted to the value "zero", thereby improving image
compression efficiency. The process of quantization is a key
operation and is an important tool to achieve visual quality
and to control the encoder to match its output to a given bit
rate (rate control). Since a different quantization value can
be applied to each DCT coefficient, a "quantization matrix"
is generally established as a reference table, e.g., a
luminance quantization table or a chrominance quantization
table. Thus, the encoder chooses a quantization matrix that
determines how each frequency coefficient in the
transformed block is quantized.
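
The division-and-rounding step described above can be sketched as follows. This is a simplified illustration, not the exact MPEG arithmetic: it assumes the step size for each coefficient is the quantization matrix entry multiplied by the quantizer scale, and the name `quantize_block` and the sample values are hypothetical.

```python
def quantize_block(coeffs, matrix, quantizer_scale):
    """Quantize coefficients: divide by (matrix entry * scale), then round.

    Assumption: step = matrix entry * quantizer_scale, with rounding
    half away from zero. Real MPEG encoders use more detailed rules.
    """
    out = []
    for coeff_row, step_row in zip(coeffs, matrix):
        row = []
        for coeff, base_step in zip(coeff_row, step_row):
            step = base_step * quantizer_scale
            magnitude = int(abs(coeff) / step + 0.5)
            row.append(magnitude if coeff >= 0 else -magnitude)
        out.append(row)
    return out

# Hypothetical 2x2 excerpt of a coefficient block and a flat matrix.
coeffs = [[100, 7], [-100, 24]]
matrix = [[16, 16], [16, 16]]
fine = quantize_block(coeffs, matrix, 1)    # smaller step, more bits
coarse = quantize_block(coeffs, matrix, 2)  # larger step, fewer bits
```

Raising the quantizer scale shrinks the quantized levels and pushes small coefficients to zero, which is the lever the rate control module uses to reduce the bit rate.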

However, subjective perception of quantization error
greatly varies with the frequency and it is advantageous to
use coarser quantization values for the higher frequencies.
Namely, human perceptual sensitivity to quantization errors
is lower for the higher spatial frequencies. As a result, high
frequencies are quantized more coarsely with fewer allowed
values than low frequencies. Furthermore, an exact
quantization matrix depends on many external parameters such as
the characteristics of the intended display, the viewing
distance and the amount of noise in the source. Thus, it is
possible to tailor a particular quantization matrix for an
application or even for an individual sequence of frames.
Generally, a customized quantization matrix can be stored as
context together with the compressed video image. The
proper selection of a quantizer scale is performed by the rate
control module 130.

Next, the resulting 8x8 block of quantized DCT coefficients
is received by variable length coding (VLC) module
180 via signal connection 171, where the two-dimensional
block of quantized coefficients is scanned in a "zig-zag"
order to convert it into a one-dimensional string of quantized
DCT coefficients. This zig-zag scanning order is an
approximate sequential ordering of the DCT coefficients from the
lowest spatial frequency to the highest. Variable length
coding (VLC) module 180 then encodes the string of
quantized DCT coefficients and all side-information for the
macroblock using variable length coding and run-length
coding.
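
The scan and run-length stages can be sketched as below. This is a minimal illustration that assumes the conventional zig-zag pattern and represents each non-zero level as a (zero run, level) pair; the helper names `zigzag_order`, `zigzag_scan` and `run_length` are hypothetical, and the actual variable length code tables are not modeled.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s indexes the anti-diagonals
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1-D list, low frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length(seq):
    """Encode a sequence as (number of preceding zeros, non-zero level) pairs.

    Trailing zeros are dropped; a real coder signals them with an
    end-of-block code.
    """
    pairs, run = [], 0
    for value in seq:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# Hypothetical quantized block with a few low-frequency levels.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 6, 2, -1
pairs = run_length(zigzag_scan(block))
```

Because quantization zeroes most high-frequency coefficients, the zig-zag order groups the zeros into long runs at the end of the string, where they cost very few bits.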

The data stream is received into a "First In-First Out" 
(FIFO) buffer 190. A consequence of using different picture 
types and variable length coding is that the overall bit rate 
into the FIFO is variable. Namely, the number of bits used 
to code each frame can be different. In applications that 
involve a fixed-rate channel, a FIFO buffer is used to match 
the encoder output to the channel for smoothing the bit rate. 
Thus, the output signal of FIFO buffer 190 on path 195 is a 
compressed representation of the input video image on path 
110 (or a compressed difference signal between the input 
image and a predicted image), where it is sent to a storage 
medium or telecommunication channel via path 195. 



The rate control module 130 serves to monitor and adjust
the bit rate of the data stream entering the FIFO buffer 190
to prevent overflow and underflow on the decoder side
(within a receiver or target storage device, not shown) after
transmission of the data stream. Thus, it is the task of the rate
control module 130 to monitor the status of buffer 190 to
control the number of bits generated by the encoder.

In the preferred embodiment of the present invention, rate
control module 130 selects a quantizer scale for each block,
e.g., a macroblock, within each frame to maintain the overall
quality of the video image while controlling the coding rate.
Namely, a frame can be evaluated to determine if certain
blocks within the frame require more or less bit rate
allocation. It has been observed that for different applications,
various blocks are of more interest than other blocks, e.g.,
the face of a person in a video phone application is more
important to a human viewer than the background in general.
Other examples include medical applications, where certain
blocks of an image, e.g., a potential tumor, are more important
than the surrounding tissues, or surveillance applications,
where certain blocks of an image, e.g., a military asset, are
more important than the surrounding camouflage, and so on.
Thus, the particular application will dictate the criteria that
define the importance of relevant blocks within a frame. In
the present invention, a quantizer scale is selected for each
macroblock within each frame such that the target bit rate for
the frame is achieved while maintaining a uniform visual quality
over the entire frame.

It should be understood that although the present invention
is described with an encoder implementing temporal
(e.g., motion estimation/compensation) and spatial encoding
(e.g., discrete cosine transform), the present invention is not
so limited. Other temporal and spatial encoding methods can
be used, including no use of any temporal and spatial
encoding.

Specifically, the rate control module 130 comprises a
frame rate allocator 131 and a macroblock rate allocator 132.
The frame rate allocator 131 allocates a bit budget (target
frame bit rate) for a current frame, whereas the macroblock
rate allocator 132 allocates a bit budget (target macroblock
bit rate or target block bit rate) for each macroblock within
the current frame.

In brief, the frame rate allocator 131 initially obtains a
rough estimate of the complexity of a specific type of picture
(I, P, B) from previously encoded pictures or by implementing
various MPEG test models. This estimated complexity is
used to derive a predicted number of bits necessary to code
each frame. With this knowledge, a quantizer scale is
calculated for the frame in accordance with a complexity
measure having a polynomial form. This complexity measure
is derived to meet the constraint that the selected
quantizer scale for the frame should approach the target bit
rate for the picture. Once the frame is encoded, the rate
control module recursively adjusts the complexity measure
through the use of a polynomial regression process. That is,
the actual number of bits necessary to code the macroblock
is used to refine the complexity measure so as to improve the
prediction of a quantizer scale for the next frame. In the
course of computing the quantizer scale, the "target frame
bit rate" is also recursively updated. This frame rate
allocating method was disclosed in patent application entitled
"Apparatus And Method For Optimizing The Rate Control
In A Coding System", filed on Feb. 11, 1998, with Ser. No.
09/022,349, which is incorporated herein by reference. It
should be understood that the present invention can be
implemented using other frame bit rate allocating methods,
e.g., frame bit rate allocating methods that are based on a
distortion measure and the like.

In brief, the macroblock rate allocator 132 then applies the
calculated target frame bit rate to determine one or more
target macroblock bit rates, where the bits of the target frame
bit rate are distributed proportional to the mean of the
absolute differences (MAD) and the weighting for a
macroblock. A detailed description of the target macroblock bit
rate and corresponding quantizer scale selection method is
discussed below with reference to FIG. 4.

However, due to human visual responses, some macroblocks
may be deemed to be more important than other
macroblocks to a human viewer. The importance of a
macroblock is determined by an optional macroblock classifying
module 120. The macroblock classifying module 120
contains the necessary criteria to define the importance of the
macroblocks within each frame. Various macroblock
classifying methods are available, e.g., as disclosed in patent
application entitled "Apparatus And Method For Employing
M-Ary Pyramids To Enhance Feature-Based Classification
And Motion Estimation", filed on Dec. 31, 1997, with Ser.
No. 09/002,258, which is incorporated herein by reference.
The "importance" of a macroblock is accounted for through the
use of weighting as described below. In brief, if a
macroblock is very important, then more bits are allocated to the
macroblock, whereas if a macroblock is not very important,
then fewer bits are allocated to the macroblock.

Alternatively, the "macroblock based" information, e.g.,
which macroblocks are more important, which macroblocks
carry what type of information, e.g., foreground,
background, or objects in a frame and the like, can be
obtained directly from the image sequence on path 112.
Namely, if the image sequence was previously processed
and stored on a storage medium, e.g., a stored video
sequence or program on a server, then it is possible that the
encoder that generated the stored video sequence may pass
along "macroblock based" information. In other words,
"macroblock based" information can be transmitted to the
encoder 100 along with the image sequence. In such an
implementation, the macroblock classifying module 120 can
be omitted, since the macroblock based information is
readily available.

Returning to FIG. 1, the resulting 8x8 block of quantized
DCT coefficients from the quantization module 170 is also
received by the inverse quantization module 175 via signal
connection 172. At this stage, the encoder regenerates
I-frames and P-frames of the input video image by decoding
the data so that they are used as reference frames for
subsequent encoding.
The resulting dequantized 8x8 block of DCT coefficients
is passed to the inverse DCT module 165 where inverse
DCT is applied to each macroblock to produce the decoded
error signal. This error signal is added back to the prediction
signal from the motion compensation module via summer
155 to produce a decoded reference picture (reconstructed
image).

FIG. 2 depicts a block diagram of a flowchart of a method
200 for deriving and allocating bits for an image based on
macroblocks within the image. More specifically, method
200 starts in step 205 and proceeds to step 210 where a target
frame bit rate is determined for a current frame. In the
preferred embodiment, the target frame bit rate is determined
using a complexity measure that is recursively
adjusted through the use of a polynomial regression process
(as illustrated in FIG. 3).

FIG. 3 illustrates a flowchart of a method 300 for determining
a target frame bit rate. Referring to FIG. 3, the
method begins at step 305 and proceeds to step 310, where
the method determines the target bits (target frame bit rate)
for a frame, T_frame, as:

    T_{frame} = \frac{R}{N_f} \times (1 - past\_percent) + T_{previous\ frame} \times past\_percent    (1)

where R is the remaining number of bits for a sequence of
frames, N_f is the number of remaining frames in the
sequence, T_previous frame is the number of bits used for
encoding the previous frame and past_percent is a
constant. In the preferred embodiment, the constant
past_percent is selected to be 0.05. However, the present
invention is not so limited. Other values can be employed that
depend on the specific applications or the context of the
images. In fact, these values can be adjusted temporally.

In sum, equation (1) allows the target frame bit rate to be
computed based on the bits available and the last encoded
frame bits. If the last frame is complex and uses many bits,
it leads to the premise that more bits should be assigned to
the current frame. However, this increased allocation will
diminish the available number of bits for encoding the
remaining frames, thereby limiting the increased allocation
to this frame. A weighted average reflects a compromise of
these two factors, as illustrated in the second term in
equation (1).

In step 320, method 300 then adjusts the calculated target
frame bit rate, T_frame, by the current buffer fullness as:

    T'_{frame} = \frac{a + c \times b}{c \times a + b} \times T_{frame}    (2)

where T'_frame is the adjusted target bit rate, "a" is the current
buffer fullness (the portion of the buffer that contains bits to
be sent to the decoder), "b" is (the physical buffer
size - buffer fullness (a)), and c is a constant selected to be
a value of 2 (other values can be used). As such, "b"
represents the remaining space in the buffer. Equation (2)
indicates that if the buffer is more than half full, the adjusted
target bit rate T'_frame is decreased. Conversely, if the buffer
is less than half full, the adjusted target bit rate T'_frame is
increased. If the buffer is exactly at half, no adjustment is
necessary, since equation (2) reduces to T'_frame = T_frame.

In step 330, method 300 then optionally verifies that a
lower bound of target frame bit rate (R_s/30) is maintained as:

    T_{frame} = \max\left(\frac{R_s}{30},\ T_{frame}\right)    (3)

It should be noted that equation (3) allows T_frame to take the
greater (max) of two possible values, where R_s is a bit rate
for the sequence (or segment), e.g., 24000 bits/sec. Namely,
a lower bound of target rate (R_s/30) is used to maintain or
guarantee a minimal quality, e.g., 800 bits/frame can be set
as a minimum. If the minimal quality cannot be maintained,
the encoder has the option to skip the current frame
altogether. Method 300 then ends in step 340.

It should be understood that other frame bit rate allocation
methods can be used, e.g., the MPEG TM4 and TM5, with
the present invention. However, since the target macroblock
bit rates are derived using the target frame bit rate, the
computational overhead and the accuracy of the target
macroblock bit rates are affected by the frame bit rate
allocation method that is employed. For example, if it is
desirable to minimize computational complexity at the
expense of performance, then it may be appropriate to
employ the frame bit rate allocation methods of MPEG TM4
and TM5. In contrast, if it is desirable to maximize
performance at the expense of increasing computational
complexity, then it may be appropriate to employ the frame
bit rate allocation method of the patent application Ser. No.
09/022,349, or other more complex frame bit rate allocation
methods.

Returning to FIG. 2, once T_frame is determined, method
200 then determines one or more target macroblock bit rates
for the macroblocks within the current image as illustrated
in FIG. 4 below. Method 200 then ends.

FIG. 4 illustrates a flowchart of a method 400 for determining
one or more target macroblock bit rates for the
macroblocks within the current image. The method starts in
step 405 and proceeds to step 410 where a sum of absolute
difference (SAD) S_i is performed for each macroblock i.
Namely, the absolute difference between each pixel value
(in the original image) and the corresponding pixel value (in
the predicted image) is performed for pixels defined within the
macroblock. Next, the sum of all the absolute differences of
the pixels for the macroblock is performed to generate the
SAD for the macroblock i.

In step 420, method 400 queries whether S_i is greater than
a threshold H_q. The threshold H_q is used to remove
various macroblocks from the target macroblock bit
allocation method. Namely, the information in certain
macroblocks can be eliminated or reduced to zero
through either spatial filtering or quantization. Since these
macroblocks will be deemed to carry no information, bits
will not be allocated to these macroblocks. Thus, the threshold
H_q is used to eliminate various macroblocks from
consideration.

In operation, the threshold H_q is selected as the average of
all the mean absolute differences (MADs) that have been
skipped in the previous frame, where MAD_i is defined as the
S_i divided by the number of pixels in the macroblock i.
However, if the current frame is the first frame in the image
sequence, then H_q is set to be half of the average of all the
existing MADs in this current frame.

Thus, if the query at step 420 is answered negatively, then
method 400 proceeds to step 425, where the present
macroblock i is removed from consideration for receiving bit
allocation. If the query at step 420 is answered positively,
then method 400 proceeds to step 423, where the present
macroblock i will be considered for receiving bit allocation,
as discussed below.

In step 430, target macroblock bit rate, R_i, is determined
for each macroblock i (i=1, 2, 3 . . . ) as follows:

    R_i = K_i \times T_{frame}    (4)
    K_i = \frac{w \times Mad_i}{\sum_{k=i}^{N} w \times Mad_k}, \quad \text{for } Mad_k \text{ having } SAD_k > H_q    (5)

where Mad_i is the mean absolute difference (MAD) of a
macroblock i, N is the number of macroblocks in a frame,
"w" is a weighting factor and R_i is the estimated target
macroblock bit rate for macroblock i. The weighting factor
w allows R_i for a given macroblock to be adjusted in
accordance with other criteria. Namely, it has been observed
that some macroblocks can be viewed as being more
important than other macroblocks, where "importance" is not
based solely on the MAD of a macroblock. Various
applications, as illustrated above, may place emphasis on certain
macroblocks. By incorporating an optional weighting factor
w, more or fewer bits can be allocated to a particular
macroblock based upon application specific criteria. In the
preferred embodiment, w is set to a value of "1", but other
values can be used as necessary for a particular application.

Thus, the target frame bits T_frame are distributed
proportional to the mean of the absolute differences (MAD) of a
macroblock. For example, if an image has only four (4)
macroblocks "a", "b", "c" and "d", having Mad_a=1, Mad_b=1,
Mad_c=3 and Mad_d=4, respectively, T_frame=100 and H_q is
such that macroblocks a and b are removed from
consideration for receiving bit allocation, then R_c and R_d
are given respectively as:

    R_c = \frac{3w}{3w + 4w} \times 100 \quad \text{and} \quad R_d = \frac{4w}{4w} \times (100 - R_c)

Thus, the MADs of macroblocks a and b are not used in the
determination of R_c and R_d. Furthermore, the above example
illustrates the need to update T_frame after each R_i is allocated
to a macroblock.

Once R_i for each macroblock in the image is determined,
method 400, in step 440, calculates a quantization scale Q_i
for each macroblock i. In the preferred embodiment, the
quantization scale Q_i is calculated in accordance with a
distortion measure as described in U.S. patent application
with Ser. No. 09/022,349. In brief, the quantizer scale Q_i is
derived from a quadratic rate-distortion method as expressed
below:

    R_i = X_1 E_i Q_i^{-1} + X_2 E_i Q_i^{-2} \quad \text{where } \sum_i R_i = T_{frame}    (6)

    [equation illegible in source]    (7)

R_s represents the bit rate for the sequence (or segment), e.g.,
24000 bits per second. N_f represents the distance between
encoded frames. Namely, due to low bit rate applications,
certain frames within a sequence may not be encoded
(skipped), e.g., the encoder may only encode every fourth
frame. It should be understood that the number of skipped
frames can be tailored to the requirement of a particular
application. Thus, in step 440, the quantizer scale Q_i can be
determined in accordance with equations (6) and (7).

However, the calculated Q_i is limited by the condition that
it should not be varied too significantly from macroblock to
macroblock. Namely, Q_i is limited by the conditions:

    \text{if } Q_i < (Q_{last} - c), \text{ then } Q_i = (Q_{last} - c); \quad \text{if } Q_i > (Q_{last} + c), \text{ then } Q_i = (Q_{last} + c)    (8)

where Q_last is the quantizer scale for the previous
macroblock having an R_i, and c is a constant selected to be
a value of "2" (other values can be used depending on the
application). Namely, the calculated Q_i is limited by a
change to be no greater than a value of "2" from macroblock
to macroblock. This limitation maintains uniform visual
quality, i.e., minimizing significant changes in visual quality
from macroblock to macroblock. It should be noted that if an
immediate macroblock is skipped (a macroblock without an
R_i), then Q_last is based on the next immediate "non-skipped"
macroblock.

In step 450, the Q_i is used to encode the macroblock to
generate an "actual R_i" (the actual number of bits used to
encode the macroblock). After actual encoding, the actual R_i
and the Q_i are used to update the parameters X_1 and X_2 using
a polynomial regression model or a quadratic regression
model to refine the parameters X_1 and X_2. Namely, the
constants X_1 and X_2 are updated to account for the
discrepancy between the bits allocated to a macroblock and the
actual number of bits needed to code the macroblock for
a particular quantizer level or scale. Regression models are
well known in the art. For a detailed discussion of various
regression models, see e.g., Bowerman and O'Connell,
Forecasting and Time Series, 3rd Edition, Duxbury Press,
(1993, chapter 4).

Furthermore, T_frame is then updated by subtracting the
actual R_i:

    T_{frame} = T_{frame} - \text{actual } R_i    (9)

where R, is the actual bits used for macroblock i, Q t - 50 When the entire frame is encoded, the actual T frvme (the 
represents a quantization level or scale selected for the actual number of bits used to encode the frame) can then be 
macroblock i, E,- represents a distortion measure. In the used to update the method that is tasked with generating a 
preferred embodiment, E, represents a sum of all the mean target frame bit rate, e.g. as illustrated in FIG. 3 above, 
absolute differences (MADs) for those macroblocks having Although the above embodiment employs R, to compute 
R,. for the current frame. Namely, the measure E, provides a 55 a quantizer scale for a macroblock, other coding parameters 
method of adjusting the macroblock bit budget to account such as allocation of computing resources can be imple- 
for the differences in the macroblock between successive mented. Namely, if it is determined that a particular mac- 
frames in a sequence. In other words, the greater the roblock or a series of macroblocks has a high R„ then it is 
differences between an macroblock in the current frame and possible to allocate more processing power, e.g., dedicating 
the same macroblock in a previous frame, the greater the 60 more processors in a multiple processors coding system in 
number of bits that will be required to code the macroblock coding a particular macroblock or a series of macroblocks. 
in the current frame. Furthermore, other distortion measures In step 470, method 400 queries whether there is a next 
can be used, such that E, may represent mean square error macroblock in the frame. If the query is negatively 
or just-noticeable difference (jnd). answered, then method 400 ends in step 480. If the query is 
During initialization, R, in equation (6) is substituted with 65 positively answered, then method 400 returns to step 430, 
the calculated R, to generate Q f . The parameters Xj and X 2 where steps 430-460 are repeated until all the macroblocks 
are initialized as follows: are evaluated in the current frame. 
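
The per-macroblock steps described above (MAD-proportional bit allocation, solving equation (6) for the quantizer scale, and the limit of equation (8)) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the quantizer range of 1 to 31, and the fallback to the linear term when X_2 is not positive are assumptions introduced here. The sketch also reduces the remaining frame budget by the allocated R_i, whereas the method updates T_frame with the actual bits after encoding (equation (9)).

```python
import math


def allocate_bits(mads, sads, t_frame, h_q, weights=None):
    """Target bits R_i per macroblock, proportional to weighted MAD.

    Macroblocks whose SAD does not exceed h_q get no allocation (None).
    The denominator sums over the remaining eligible macroblocks k >= i,
    and the frame budget shrinks after each allocation.
    """
    n = len(mads)
    weights = weights or [1.0] * n          # w = 1 in the preferred embodiment
    eligible = [sads[k] > h_q for k in range(n)]
    remaining = float(t_frame)
    targets = []
    for i in range(n):
        if not eligible[i]:
            targets.append(None)
            continue
        denom = sum(mads[k] * weights[k] for k in range(i, n) if eligible[k])
        r_i = mads[i] * weights[i] / denom * remaining if denom else 0.0
        targets.append(r_i)
        remaining -= r_i                    # stand-in for the "actual R_i" update
    return targets


def solve_q(r_target, e, x1, x2, q_min=1, q_max=31):
    """Solve R = X1*E/Q + X2*E/Q^2 for Q (q_min/q_max are assumed bounds)."""
    if r_target <= 0 or e <= 0:
        return q_max
    if x2 > 0:
        a, b = x2 * e, x1 * e               # a*u^2 + b*u - R = 0 with u = 1/Q
        u = (-b + math.sqrt(b * b + 4.0 * a * r_target)) / (2.0 * a)
        q = 1.0 / u
    else:
        q = x1 * e / r_target               # assumed fallback: linear term only
    return min(max(q, q_min), q_max)


def limit_q(q, q_last, alpha=2):
    """Keep Q within +/- alpha of the last non-skipped macroblock's Q."""
    if q_last is None:
        return q
    if q < q_last - alpha:
        return q_last - alpha
    if q > q_last + alpha:
        return q_last + alpha
    return q
```

With the four-macroblock example from the text (MADs 1, 1, 3, 4 and a threshold that excludes a and b), `allocate_bits` assigns 3/7 of the 100 bits to c and the remainder to d.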


FIG. 5 depicts a wavelet-based encoder 500 that incor- 
porates the present invention. The encoder contains a block 
motion compensator (BMC) and motion vector coder 504, 
subtractor 502, discrete wavelet transform (DWT) coder
506, bit rate controller 510, DWT decoder 512 and output
buffer 514. 

In general, as discussed above the input signal is a video
image (a two-dimensional array of pixels (pels) defining a
frame in a video sequence). To accurately transmit the image
through a low bit rate channel, the spatial and temporal
redundancy in the video frame sequence must be substan-
tially reduced. This is generally accomplished by coding and
transmitting only the differences between successive frames.
The encoder has three functions: first, it produces, using the
BMC and its coder 504, a plurality of motion vectors that
represent motion that occurs between frames; second, it
predicts the present frame using a reconstructed version of
the previous frame combined with the motion vectors; and
third, the predicted frame is subtracted from the present
frame to produce a frame of residuals that are coded and
transmitted along with the motion vectors to a receiver.
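
The second and third functions (prediction from the previous frame and subtraction to form residuals) can be illustrated with a small sketch. It assumes integer-pel motion vectors given as (dy, dx), frames stored as lists of rows, and border clamping for references that fall outside the frame; none of these details are specified in the text, and the function names are hypothetical.

```python
def predict_block(ref, top, left, mv, size):
    """Fetch the size x size block of the reference frame displaced by mv."""
    h, w = len(ref), len(ref[0])
    dy, dx = mv
    block = []
    for y in range(size):
        row = []
        for x in range(size):
            ry = min(max(top + y + dy, 0), h - 1)   # clamp at frame borders (assumption)
            rx = min(max(left + x + dx, 0), w - 1)
            row.append(ref[ry][rx])
        block.append(row)
    return block


def residual_block(cur, ref, top, left, mv, size):
    """Current block minus its motion-compensated prediction."""
    pred = predict_block(ref, top, left, mv, size)
    return [
        [cur[top + y][left + x] - pred[y][x] for x in range(size)]
        for y in range(size)
    ]
```

When the motion vector matches the true displacement, the residual block is all zeros, which is why a good motion estimate reduces the bits needed for the residual frame.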

The discrete wavelet transform performs a wavelet hier-
archical subband decomposition to produce a conventional
wavelet tree representation of the input image. To accom-
plish such image decomposition, the image is decomposed
using times two subsampling into high horizontal-high ver-
tical (HH), high horizontal-low vertical (HL), low
horizontal-high vertical (LH), and low horizontal-low ver-
tical (LL), frequency subbands. The LL subband is then
further subsampled times two to produce a set of HH, HL,
LH and LL subbands. This subsampling is accomplished
recursively to produce an array of subbands such as that
illustrated in FIG. 6 where three subsamplings have been
used. Preferably six subsamplings are used in practice. The
parent-child dependencies between subbands are illustrated
as arrows pointing from the subband of the parent nodes to
the subbands of the child nodes. The lowest frequency
subband is the top left LL_1, and the highest frequency
subband is at the bottom right HH_3. In this example, all child
nodes have one parent. A detailed discussion of subband
decomposition is presented in J. M. Shapiro, "Embedded
Image Coding Using Zerotrees of Wavelet Coefficients",
IEEE Trans. on Signal Processing, Vol. 41, No. 12, pp.
3445-62, December 1993.
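
The recursive decomposition described above can be sketched as follows. The text does not name a wavelet filter, so this example assumes the simplest one (an averaging Haar filter) and an image whose dimensions are divisible by 2 for every level; `haar_level` and `decompose` are hypothetical names.

```python
def haar_level(img):
    """One level of 2-D Haar analysis with times-two subsampling.

    Returns (LL, HL, LH, HH); the first letter is the horizontal band
    and the second the vertical band, matching the naming in the text.
    """
    ll, hl, lh, hh = [], [], [], []
    for y in range(0, len(img), 2):
        r_ll, r_hl, r_lh, r_hh = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_ll.append((a + b + c + d) / 4.0)
            r_hl.append((a - b + c - d) / 4.0)  # high horizontal, low vertical
            r_lh.append((a + b - c - d) / 4.0)  # low horizontal, high vertical
            r_hh.append((a - b - c + d) / 4.0)  # high in both directions
        ll.append(r_ll)
        hl.append(r_hl)
        lh.append(r_lh)
        hh.append(r_hh)
    return ll, hl, lh, hh


def decompose(img, levels):
    """Recursively split the LL band; returns (final LL, detail bands per level)."""
    bands = []
    ll = img
    for _ in range(levels):
        ll, hl, lh, hh = haar_level(ll)
        bands.append({"HL": hl, "LH": lh, "HH": hh})  # finest level first
    return ll, bands
```

Each call to `haar_level` halves both dimensions, so three levels on a 64x64 image would leave an 8x8 LL band plus three sets of detail subbands.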

The DWT coder of FIG. 5 codes the coefficients of the
wavelet tree in either a "breadth first" or "depth first"
pattern. A breadth first pattern traverses the wavelet tree in a
bit-plane by bit-plane pattern, i.e., quantize all parent nodes,
then all children, then all grandchildren and so on. In
contrast, a depth first pattern traverses each tree from the
root in the low-low subband (LL_1) through the children (top
down) or children through the low-low subband (bottom
up). The selection of the proper quantization level by the rate
controller 510 is as discussed above to control the bit rate for
each macroblock within each frame of a sequence. As such,
the present invention can be adapted to various types of
encoders that use different transforms.
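
The difference between the two traversal patterns can be shown on a toy tree. The sketch below assumes each node is a `(name, children)` tuple, which is a hypothetical representation rather than the coder's actual data structure, and shows only the top-down depth first order.

```python
from collections import deque


def breadth_first(root):
    """Visit all parents, then all children, then all grandchildren."""
    order, queue = [], deque([root])
    while queue:
        name, children = queue.popleft()
        order.append(name)
        queue.extend(children)
    return order


def depth_first(root):
    """Visit each branch from the root down before moving to the next branch."""
    name, children = root
    order = [name]
    for child in children:
        order.extend(depth_first(child))
    return order


# Toy tree: a root in the low-low subband with two children and three grandchildren.
tree = ("LL1", [("c1", [("g1", []), ("g2", [])]), ("c2", [("g3", [])])])
```

For this tree, `breadth_first(tree)` yields the nodes generation by generation, while `depth_first(tree)` finishes the subtree under `c1` before visiting `c2`.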

FIG. 7 illustrates an encoding system 700 of the present 
invention. The encoding system comprises a general pur-
pose computer 710 and various input/output devices 720.
The general purpose computer comprises a central process- 
ing unit (CPU) 712, a memory 714 and an encoder 716 for 
receiving and encoding a sequence of images. 

In the preferred embodiment, the encoder 716 is simply
the encoder 100 and/or encoder 500 as discussed above. The
encoder 716 can be a physical device which is coupled to the
CPU 712 through a communication channel. Alternatively,
the encoder 716 can be represented by a software application
(or a combination of software and hardware, e.g., applica-
tion specific integrated circuits (ASIC)) which is loaded
from a storage device and resides in the memory 714 of the
computer. As such, the encoder 100 and 500 of the present
invention can be stored on a computer readable medium,
e.g., a memory or storage device.

The computer 710 can be coupled to a plurality of input 
and output devices 720, such as a keyboard, a mouse, a 
camera, a camcorder, a video monitor, any number of 
imaging devices or storage devices, including but not lim- 
ited to, a tape drive, a floppy drive, a hard disk drive or a 
compact disk drive. The input devices serve to provide 
inputs to the computer for producing the encoded video 
bitstreams or to receive the sequence of video images from 
a storage device or an imaging device. Finally, a commu- 
nication channel 730 is shown where the encoded signal 
from the encoding system is forwarded to a decoding system 
(not shown). 

There has thus been shown and described a novel appa- 
ratus and method that selects a quantizer scale for each 
macroblock within each frame to maintain the overall qual- 
ity of the video image while optimizing the coding rate. 
Many changes, modifications, variations and other uses and 
applications of the subject invention will, however, become 
apparent to those skilled in the art after considering this 
specification and the accompanying drawings which dis- 
close the embodiments thereof. All such changes, 
modifications, variations and other uses and applications 
which do not depart from the spirit and scope of the 
invention are deemed to be covered by the invention. 

What is claimed is: 

1. A method for allocating bits to encode each frame of an 
image sequence, each of said frame having at least one 
block, said method comprising the steps of: 

(a) determining a target frame bit rate for the frame; and 

(b) allocating said target frame bit rate among the at least 
one block in accordance with a target block bit rate for 
the at least one block, wherein said target block bit rate 
for the at least one block is selected in accordance with 
a mean absolute difference (Mad) of said block. 

2. The method of claim 1, wherein said target block bit
rate is determined in accordance with:

$$R_i = K_i \times T_{frame}$$

$$K_i = \frac{Mad_i\, w_i}{\sum_{k=i}^{n} Mad_k\, w_k} \quad \text{for } Mad_k \text{ having } SAD_k > H_q$$

where Mad is the mean absolute difference (MAD) of a
block, n is a number of blocks in the frame, w is a
weighting factor and R_i is said target block bit rate, Sad
is a sum of absolute difference (SAD) of a block, T_frame
is said target frame bit rate and H_q is a constant.

3. The method of claim 1, wherein said target block bit
rate is adjusted in accordance with a threshold H_q.

4. The method of claim 3, wherein said target block bit
rate is adjusted by removing each block within the frame
having a sum of absolute difference (SAD) that is less than
said threshold H_q from said step (b) of allocating said target
frame bit rate among the at least one block.

5. The method of claim 1, wherein said block is a 
macroblock. 

6. The method of claim 1, wherein said target frame bit
rate, T_frame, is derived in accordance with:

$$T_{frame} = \frac{R}{N_f} \times (1 - \text{past\_percent}) + T_{\text{previous frame}} \times \text{past\_percent}$$

where R is a remaining number of bits for the image
sequence, Nf is a number of remaining frames in the image
sequence, T_previous frame is a number of bits used for encod-
ing a previous frame, and past_percent is a constant.

7. The method of claim 1, further comprising the step of:

(c) generating a quantizer scale for said at least one block
in accordance with said target block bit rate.

8. The method of claim 7, further comprising the step of:

(C) adjusting said quantizer scale in accordance with a
previous quantizer scale of a previous block.

9. The method of claim 7, further comprising the step of:

(d) encoding said at least one block with said quantizer
scale.

10. Apparatus for encoding each frame of an image
sequence, said frame having at least one block, said appa-
ratus comprising:

a motion compensator for generating a predicted image of
a current frame;

a transform module for applying a transformation to a
difference signal between the current frame and said
predicted image, where said transformation produces a
plurality of coefficients;

a quantizer for quantizing said plurality of coefficients
with at least one quantizer scale; and

a controller for selectively adjusting said at least one
quantizer scale for a current frame in response to a
target block bit rate for the at least one block, wherein
said target block bit rate for the at least one block is
selected in accordance with a mean absolute difference
(Mad) of said block.

11. The apparatus of claim 10, wherein said target block
bit rate for each of a plurality of blocks is selected in
accordance with:

$$R_i = K_i \times T_{frame}$$

$$K_i = \frac{Mad_i\, w_i}{\sum_{k=i}^{n} Mad_k\, w_k} \quad \text{for } Mad_k \text{ having } SAD_k > H_q$$

where Mad is the mean absolute difference (MAD) of a
block, n is a number of blocks in the frame, w is a
weighting factor and R_i is said target block bit rate, Sad
is a sum of absolute difference (SAD) of a block, T_frame
is said target frame bit rate and H_q is a constant.

12. The apparatus of claim 10, wherein said target mac-
roblock bit rate is adjusted in accordance with a threshold
H_q.

13. The apparatus of claim 12, wherein said target block
bit rate is adjusted by removing each block within the frame
having a sum of absolute difference (SAD) that is less than
said threshold H_q.

14. The apparatus of claim 10, wherein said block is a
macroblock.

15. The apparatus of claim 10, wherein said target block
bit rate is derived from a target frame bit rate.

16. The apparatus of claim 15, wherein said target frame
bit rate, T_frame, is derived in accordance with:

$$T_{frame} = \frac{R}{N_f} \times (1 - \text{past\_percent}) + T_{\text{previous frame}} \times \text{past\_percent}$$

where R is a remaining number of bits for a sequence of
frames, Nf is a number of remaining frames in the sequence,
T_previous frame is a number of bits used for encoding a
previous frame, and past_percent is a constant.

17. A computer-readable medium having stored thereon a
plurality of instructions, the plurality of instructions includ-
ing instructions which, when executed by a processor, cause
the processor to perform the steps comprising of:

(a) determining a target frame bit rate for the frame; and

(b) allocating said target frame bit rate among the at least
one block in accordance with a target block bit rate for
the at least one block, wherein said target block bit rate
for the at least one block is selected in accordance with
a mean absolute difference (Mad) of said block.

18. The computer-readable medium of claim 17, wherein
said target block bit rate is determined in accordance with:

$$R_i = K_i \times T_{frame}$$

$$K_i = \frac{Mad_i\, w_i}{\sum_{k=i}^{n} Mad_k\, w_k} \quad \text{for } Mad_k \text{ having } SAD_k > H_q$$

where Mad is the mean absolute difference (MAD) of a
block, n is a number of blocks in the frame, w is a
weighting factor and R_i is said target block bit rate, Sad
is a sum of absolute difference (SAD) of a block, T_frame
is said target frame bit rate and H_q is a constant.

19. The computer-readable medium of claim 17, wherein
said target block bit rate is adjusted in accordance with a
threshold H_q.

20. The computer-readable medium of claim 17, wherein
said target frame bit rate, T_frame, is derived in accordance
with:

$$T_{frame} = \frac{R}{N_f} \times (1 - \text{past\_percent}) + T_{\text{previous frame}} \times \text{past\_percent}$$

where R is a remaining number of bits for the image
sequence, Nf is a number of remaining frames in the image
sequence, T_previous frame is a number of bits used for encod-
ing a previous frame, and past_percent is a constant.

* * * * *
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